NSF PAR Search | NSF Public Access Repository

Transcriptome Assembly at Single-Cell Resolution with Beaver

Shi, Qian; Zhang, Qimin; Shao, Mingfu (April 2025, Oxford University Press)

Free, publicly-accessible full text available April 24, 2026

Investigating urban-scale building thermal resilience under compound heat waves and power outage events based on urban morphology analysis

https://doi.org/10.1016/j.buildenv.2025.112747

Shi, Qian; Luo, Wensen; Xiao, Chao; Wang, Julian; Zhu, Han; Chen, Xin (May 2025, Building and Environment)

Free, publicly-accessible full text available May 1, 2026

Dual impacts of solar-reflective façades in high-density urban areas on building energy use and outdoor thermal environments

https://doi.org/10.1016/j.enbuild.2024.114926

Chen, Chenshun; Wang, Julian; Zhang, Huijin; Xu, Xinyue; Hinkle, Laura Elizabeth; Chao, Xiao; Shi, Qian (December 2024, Energy and Buildings)

Full Text Available

Review of Sustainable Urban Planning and Design Policy Interventions for Heatwave Management in Urban Environments

https://doi.org/10.52202/077496-0031

Zhang, Huijin; Wang, Nan; Shi, Qian; Xiao, Chao; Wang, Julian (October 2024, American Solar Energy Society)

Climate change leads to frequent extreme temperature events, making cities vulnerable to severe heatwaves. This study therefore provides a systematic, overarching review of urban planning and design policy interventions for heatwave management. A series of key terms was used to search for relevant studies in three databases (Web of Science, ScienceDirect, and Wiley), and 28 articles published between 2007 and 2023 were identified after applying several inclusion and exclusion criteria. From the systematic review, 15 policy interventions for heatwave management were summarized at the built-environment level and the building level. Cooling mechanisms and the scope of application were discussed. The results of this study provide policymakers with comprehensive guidance on sustainable urban design and planning for heatwave management.

Full Text Available

Accurate assembly of circular RNAs with TERRACE

https://doi.org/10.1101/gr.279106.124

Zahin, Tasfia; Shi, Qian; Zang, Xiaofei Carl; Shao, Mingfu (September 2024, Genome Research)

Circular RNAs (circRNAs) are a class of RNA molecules that form a closed loop with their 5′ and 3′ ends covalently bonded. CircRNAs are known to be more stable than linear RNAs, have distinct properties and functions, and are promising biomarkers. Existing methods for assembling circRNAs rely heavily on annotated transcriptomes and hence exhibit unsatisfactory accuracy without a high-quality transcriptome. We present TERRACE, a new algorithm for full-length assembly of circRNAs from paired-end total RNA-seq data. TERRACE uses the splice graph as the underlying data structure that organizes the splicing and coverage information. We transform the problem of assembling circRNAs into finding paths that “bridge” the three fragments in the splice graph induced by back-spliced reads. We adopt a definition for optimal bridging paths and a dynamic programming algorithm to calculate such optimal paths. TERRACE features an efficient algorithm to detect back-spliced reads missed by RNA-seq aligners, contributing to its much-improved sensitivity. It also incorporates a new machine-learning approach trained to assign a confidence score to each assembled circRNA, which is shown to be superior to using abundance for scoring. On both simulations and biological data sets, TERRACE consistently outperforms existing methods by a large margin in sensitivity while achieving better or comparable precision. In particular, when the annotations are not provided, TERRACE assembles 123%–413% more correct circRNAs than state-of-the-art methods. TERRACE presents a significant advance in assembling full-length circRNAs from RNA-seq data, and we expect it to be widely used in future research on circRNAs.

Full Text Available

Accurate assembly of multiple RNA-seq samples with Aletsch

https://doi.org/10.1093/bioinformatics/btae215

Shi, Qian; Zhang, Qimin; Shao, Mingfu (June 2024, Bioinformatics)

Abstract. Motivation: High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations. Results: We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a “bridging” system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages “supporting” information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, firmly demonstrate that Aletsch significantly outperforms existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%–62.1% and PsiCLASS by 23.0%–175.5% on human datasets. Availability and implementation: Aletsch is freely available at https://github.com/Shao-Group/aletsch. Scripts that reproduce the experimental results of this manuscript are available at https://github.com/Shao-Group/aletsch-test.

SlipO₂ Chip – single-cell respiration under tuneable environments

https://doi.org/10.1039/D4LC00420E

Cui, Yuan; Moreira, Milena De Albuquerque; Whalen, Kristen E; Barbe, Laurent; Shi, Qian; Koren, Klaus; Tenje, Maria; Behrendt, Lars (October 2024, Lab on a Chip)

In disciplines like toxicology and pharmacology, oxygen (O₂) respiration is a universal metric for evaluating the effects of chemicals across various model systems, including mammalian and microalgal cells.

Full Text Available

Learning locality-sensitive bucketing functions

https://doi.org/10.1093/bioinformatics/btae228

Yuan, Xin; Chen, Ke; Li, Xiang; Shi, Qian; Shao, Mingfu (June 2024, Bioinformatics)

Abstract. Motivation: Many tasks in sequence analysis ask to identify biologically related sequences in a large set. The edit distance, being a sensible model for both evolution and sequencing error, is widely used in these tasks as a measure. The resulting computational problem, recognizing all pairs of sequences within a small edit distance, turns out to be exceedingly difficult, since the edit distance is notoriously expensive to compute and all-versus-all comparison is simply not acceptable with millions or billions of sequences. Among many attempts, we recently proposed the locality-sensitive bucketing (LSB) functions to meet this challenge. Formally, a (d1,d2)-LSB function sends sequences into multiple buckets with the guarantee that pairs of sequences of edit distance at most d1 can be found within the same bucket while those of edit distance at least d2 do not share any. LSB functions generalize the locality-sensitive hashing (LSH) functions and admit favorable properties, with a notable highlight being that optimal LSB functions for certain (d1,d2) exist. LSB functions hold the potential of solving the above problems optimally, but the existence of LSB functions for more general (d1,d2) remains unclear, let alone constructing them for practical use. Results: In this work, we aim to utilize machine learning techniques to train LSB functions. With the development of a novel loss function and insights into neural network structures that can potentially extend beyond this specific task, we obtained LSB functions that exhibit nearly perfect accuracy for certain (d1,d2), matching our theoretical results, and high accuracy for many others. Compared to the state-of-the-art LSH method Order Min Hash, the trained LSB functions achieve a 2- to 5-fold improvement in the sensitivity of recognizing similar sequences. An experiment on analyzing erroneous cell barcode data is also included to demonstrate the application of the trained LSB functions.
Availability and implementation: The code for the training process and the structure of trained models are freely available at https://github.com/Shao-Group/lsb-learn.
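
The (d1,d2)-LSB guarantee stated in the abstract above can be checked directly on small inputs. The following is a minimal sketch, not the trained models from the paper: the names `edit_distance`, `deletion_buckets`, and `is_lsb` are hypothetical, and the toy bucketing function (a sequence goes into the bucket named by itself and into the buckets named by each of its one-character deletions) is an assumption chosen only because its behavior is easy to reason about.

```python
from itertools import combinations, product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def deletion_buckets(s: str) -> set:
    """Toy bucketing function (an assumption, not from the paper):
    the sequence itself plus every one-character deletion of it."""
    return {s} | {s[:i] + s[i + 1:] for i in range(len(s))}


def is_lsb(bucket_fn, seqs, d1: int, d2: int) -> bool:
    """Check the (d1,d2)-LSB guarantee on every pair drawn from seqs."""
    for a, b in combinations(seqs, 2):
        dist = edit_distance(a, b)
        shared = bool(bucket_fn(a) & bucket_fn(b))
        if dist <= d1 and not shared:
            return False  # a close pair was separated
        if dist >= d2 and shared:
            return False  # a distant pair collided
    return True


if __name__ == "__main__":
    # All strings over {A, C} of length 1 to 4.
    seqs = ["".join(p) for n in range(1, 5) for p in product("AC", repeat=n)]
    print(is_lsb(deletion_buckets, seqs, 1, 3))
    print(is_lsb(deletion_buckets, seqs, 2, 3))
```

On this input the toy function passes the (1,3) check: two sequences that share a bucket are each within one deletion of it, so their edit distance is at most 2. It fails the (2,3) check because, for example, AA and CC are at distance 2 but share no bucket.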

Seeding with minimized subsequence

https://doi.org/10.1093/bioinformatics/btad218

Li, Xiang; Shi, Qian; Chen, Ke; Shao, Mingfu (June 2023, Bioinformatics)

Abstract. Motivation: Modern methods for computation-intensive tasks in sequence analysis (e.g. read mapping, sequence alignment, genome assembly, etc.) often first transform each sequence into a list of short, regular-length seeds so that compact data structures and efficient algorithms can be employed to handle the ever-growing large-scale data. Seeding methods using kmers (substrings of length k) have gained tremendous success in processing sequencing data with low mutation/error rates. However, they are much less effective for sequencing data with high error rates as kmers cannot tolerate errors. Results: We propose SubseqHash, a strategy that uses subsequences, rather than substrings, as seeds. Formally, SubseqHash maps a string of length n to its smallest subsequence of length k, k < n, according to a given order over all length-k strings. Finding the smallest subsequence of a string by enumeration is impractical as the number of subsequences grows exponentially. To overcome this barrier, we propose a novel algorithmic framework that consists of a specifically designed order (termed ABC order) and an algorithm that computes the minimized subsequence under an ABC order in polynomial time. We first show that the ABC order exhibits the desired property and the probability of hash collision using the ABC order is close to the Jaccard index. We then show that SubseqHash overwhelmingly outperforms the substring-based seeding methods in producing high-quality seed-matches for three critical applications: read mapping, sequence alignment, and overlap detection. SubseqHash presents a major algorithmic breakthrough for tackling the high error rates and we expect it to be widely adopted for long-reads analysis. Availability and implementation: SubseqHash is freely available at https://github.com/Shao-Group/subseqhash.
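
The seed definition in the abstract above (the smallest length-k subsequence under a given order) can be illustrated by brute force. This is a minimal sketch of the enumeration approach that the abstract itself calls impractical, not the ABC-order algorithm or the SubseqHash implementation. The function name `min_subsequence` and its `key` parameter are hypothetical, and plain lexicographic order stands in for the "given order" as an assumption.

```python
from itertools import combinations
from typing import Callable


def min_subsequence(s: str, k: int,
                    key: Callable[[str], object] = lambda t: t) -> str:
    """Return the smallest length-k subsequence of s under the order induced
    by `key` (lexicographic by default). Enumerates all C(n, k) position
    choices, so it is only usable on very short strings."""
    if not 0 < k < len(s):
        raise ValueError("k must satisfy 0 < k < len(s)")
    # combinations() keeps characters in their original left-to-right order,
    # so every tuple it yields is a subsequence of s.
    candidates = {"".join(chars) for chars in combinations(s, k)}
    # Sorting first makes the result deterministic if `key` produces ties.
    return min(sorted(candidates), key=key)


if __name__ == "__main__":
    print(min_subsequence("ACGT", 2))
    print(min_subsequence("TGCA", 2))
```

Because the number of candidates grows as C(n, k), this sketch only makes the definition concrete; the polynomial-time computation described in the abstract depends on the specially designed ABC order, which is not reproduced here.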

Why Students Choose STEM: A Study of High School Factors That Influence College STEM Major Choice

https://doi.org/10.18260/1-2--44049

Main, Joyce; Dang, Tram; Johnson, Beata; Shi, Qian; Guariniello, Cesare; Delaurentis, Daniel (June 2023, ASEE Conferences)

Full Text Available
